Overview

Dataset statistics

Number of variables: 19
Number of observations: 8578
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 152.0 B
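
The two memory figures above agree with each other under one simple assumption. A minimal sketch, assuming every one of the 19 columns stores fixed-width 8-byte values (the report does not state the dtypes, so `BYTES_PER_VALUE` is hypothetical):

```python
# Assumption (not stated in the report): each column holds 8-byte values,
# for example int64 or float64.
N_ROWS = 8578
N_COLS = 19
BYTES_PER_VALUE = 8  # hypothetical

record_bytes = N_COLS * BYTES_PER_VALUE           # bytes per observation
total_bytes = record_bytes * N_ROWS               # bytes for the whole table
total_mib = total_bytes / 2**20                   # mebibytes
column_kib = N_ROWS * BYTES_PER_VALUE / 1024      # kibibytes for one column

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"one column:  {column_kib:.1f} KiB")
```

The per-column figure computed this way comes out slightly below the 67.1 KiB reported for each variable; the difference is presumably a small fixed overhead such as the index.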

Variable types

Numeric: 9
Categorical: 10

Alerts

Unnamed: 0 is highly correlated with ID (High correlation)
ID is highly correlated with Unnamed: 0 (High correlation)
NAME_FAMILY_STATUS is highly correlated with CNT_FAM_MEMBERS (High correlation)
CNT_FAM_MEMBERS is highly correlated with NAME_FAMILY_STATUS (High correlation)
CODE_GENDER is highly correlated with FLAG_OWN_CAR and 1 other field (High correlation)
FLAG_OWN_CAR is highly correlated with CODE_GENDER (High correlation)
NAME_INCOME_TYPE is highly correlated with OCCUPATION_TYPE and 2 other fields (High correlation)
OCCUPATION_TYPE is highly correlated with CODE_GENDER and 1 other field (High correlation)
AGE is highly correlated with NAME_INCOME_TYPE (High correlation)
YEARS_EMPLOYED is highly correlated with NAME_INCOME_TYPE (High correlation)
STATUS is uniformly distributed (Uniform)
Unnamed: 0 has unique values (Unique)
ID has unique values (Unique)
OCCUPATION_TYPE has 294 (3.4%) zeros (Zeros)
YEARS_EMPLOYED has 1351 (15.7%) zeros (Zeros)
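
The percentages in the Zeros alerts are plain shares of the 8578 observations. A small sketch of that arithmetic; the helper name `share_pct` is illustrative only and is not part of pandas-profiling:

```python
N_OBSERVATIONS = 8578  # from the dataset statistics


def share_pct(count: int, total: int = N_OBSERVATIONS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Zero counts as reported in the alerts.
zero_counts = {"OCCUPATION_TYPE": 294, "YEARS_EMPLOYED": 1351}
zero_shares = {name: share_pct(count) for name, count in zero_counts.items()}
print(zero_shares)
```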

Reproduction

Analysis started: 2022-05-07 14:46:02.207852
Analysis finished: 2022-05-07 14:46:28.147763
Duration: 25.94 seconds
Software version: pandas-profiling v3.1.0
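
The reported duration is the difference between the two timestamps above. A short sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2022-05-07 14:46:02.207852")
finished = datetime.fromisoformat("2022-05-07 14:46:28.147763")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```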

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 8578
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19018.26743
Minimum: 0
Maximum: 36452
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 67.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1905.7
Q1: 9663
median: 19000.5
Q3: 28510.75
95-th percentile: 35775.6
Maximum: 36452
Range: 36452
Interquartile range (IQR): 18847.75

Descriptive statistics

Standard deviation: 10830.64959
Coefficient of variation (CV): 0.5694866595
Kurtosis: -1.207944468
Mean: 19018.26743
Median Absolute Deviation (MAD): 9444
Skewness: -0.04493941555
Sum: 163138698
Variance: 117302970.5
Monotonicity: Strictly increasing
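
Several of the statistics above are derived from the others, which gives a quick consistency check. A minimal sketch using the rounded values reported for Unnamed: 0, so small rounding differences are expected:

```python
# Values copied from the statistics reported for "Unnamed: 0".
minimum, maximum = 0, 36452
q1, q3 = 9663, 28510.75
mean, std = 19018.26743, 10830.64959
n = 8578

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation
total = mean * n                  # Sum is the mean times the number of rows

print(value_range, iqr, round(cv, 6), round(variance, 1), round(total))
```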
Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
0                      1       < 0.1%
25476                  1       < 0.1%
25533                  1       < 0.1%
25532                  1       < 0.1%
25531                  1       < 0.1%
25515                  1       < 0.1%
25514                  1       < 0.1%
25513                  1       < 0.1%
25512                  1       < 0.1%
25509                  1       < 0.1%
Other values (8568)    8568    99.9%

Minimum values

Value   Count   Frequency (%)
0       1       < 0.1%
1       1       < 0.1%
16      1       < 0.1%
18      1       < 0.1%
19      1       < 0.1%
20      1       < 0.1%
21      1       < 0.1%
22      1       < 0.1%
28      1       < 0.1%
32      1       < 0.1%

Maximum values

Value   Count   Frequency (%)
36452   1       < 0.1%
36451   1       < 0.1%
36450   1       < 0.1%
36449   1       < 0.1%
36448   1       < 0.1%
36447   1       < 0.1%
36446   1       < 0.1%
36445   1       < 0.1%
36444   1       < 0.1%
36443   1       < 0.1%

ID
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 8578
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5079032.727
Minimum: 5008804
Maximum: 5150473
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 67.1 KiB

Quantile statistics

Minimum: 5008804
5-th percentile: 5020787.8
Q1: 5044488.75
median: 5078897
Q3: 5115682.25
95-th percentile: 5146112.2
Maximum: 5150473
Range: 141669
Interquartile range (IQR): 71193.5

Descriptive statistics

Standard deviation: 41866.87599
Coefficient of variation (CV): 0.008243080572
Kurtosis: -1.211133521
Mean: 5079032.727
Median Absolute Deviation (MAD): 36661.5
Skewness: 0.05718306322
Sum: 4.356794273 × 10^10
Variance: 1752835306
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
5008804                1       < 0.1%
5105145                1       < 0.1%
5105222                1       < 0.1%
5105221                1       < 0.1%
5105220                1       < 0.1%
5105196                1       < 0.1%
5105195                1       < 0.1%
5105194                1       < 0.1%
5105193                1       < 0.1%
5105190                1       < 0.1%
Other values (8568)    8568    99.9%

Minimum values

Value     Count   Frequency (%)
5008804   1       < 0.1%
5008805   1       < 0.1%
5008823   1       < 0.1%
5008825   1       < 0.1%
5008826   1       < 0.1%
5008827   1       < 0.1%
5008830   1       < 0.1%
5008831   1       < 0.1%
5008832   1       < 0.1%
5008839   1       < 0.1%

Maximum values

Value     Count   Frequency (%)
5150473   1       < 0.1%
5150467   1       < 0.1%
5150466   1       < 0.1%
5150464   1       < 0.1%
5150463   1       < 0.1%
5150459   1       < 0.1%
5150423   1       < 0.1%
5150417   1       < 0.1%
5150414   1       < 0.1%
5150412   1       < 0.1%

CODE_GENDER
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       5656    65.9%
1       2922    34.1%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

FLAG_OWN_CAR
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
0       5406    63.0%
1       3172    37.0%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

FLAG_OWN_REALTY
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
1       5600    65.3%
0       2978    34.7%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

AMT_INCOME_TOTAL
Real number (ℝ≥0)

Distinct: 184
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 190041.0654
Minimum: 27000
Maximum: 1575000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 67.1 KiB

Quantile statistics

Minimum: 27000
5-th percentile: 76500
Q1: 121500
median: 162000
Q3: 225000
95-th percentile: 360000
Maximum: 1575000
Range: 1548000
Interquartile range (IQR): 103500

Descriptive statistics

Standard deviation: 108333.0094
Coefficient of variation (CV): 0.5700505264
Kurtosis: 22.28318222
Mean: 190041.0654
Median Absolute Deviation (MAD): 49500
Skewness: 3.225052512
Sum: 1630172259
Variance: 1.173604092 × 10^10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
135000                997     11.6%
225000                704     8.2%
157500                704     8.2%
180000                692     8.1%
112500                675     7.9%
202500                589     6.9%
270000                400     4.7%
90000                 397     4.6%
315000                239     2.8%
247500                233     2.7%
Other values (174)    2948    34.4%

Minimum values

Value   Count   Frequency (%)
27000   3       < 0.1%
29250   1       < 0.1%
31500   4       < 0.1%
32400   2       < 0.1%
33300   4       < 0.1%
36000   1       < 0.1%
36900   3       < 0.1%
37800   2       < 0.1%
38250   1       < 0.1%
39600   2       < 0.1%

Maximum values

Value     Count   Frequency (%)
1575000   2       < 0.1%
1350000   6       0.1%
1125000   1       < 0.1%
990000    2       < 0.1%
945000    1       < 0.1%
900000    14      0.2%
810000    6       0.1%
765000    2       < 0.1%
742500    1       < 0.1%
720000    4       < 0.1%
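
The "Other values" row for AMT_INCOME_TOTAL can be recovered from the ten most common values and the totals reported for the variable. A small sketch of that bookkeeping, with the counts copied from the report:

```python
N_OBSERVATIONS = 8578   # rows in the dataset
N_DISTINCT = 184        # distinct AMT_INCOME_TOTAL values

# Ten most common income values and their counts.
top_counts = {
    135000: 997, 225000: 704, 157500: 704, 180000: 692, 112500: 675,
    202500: 589, 270000: 400, 90000: 397, 315000: 239, 247500: 233,
}

other_count = N_OBSERVATIONS - sum(top_counts.values())   # rows not in the top ten
other_distinct = N_DISTINCT - len(top_counts)              # distinct values not listed
other_pct = round(100 * other_count / N_OBSERVATIONS, 1)   # share of all rows

print(other_distinct, other_count, other_pct)
```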

NAME_INCOME_TYPE
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4
2nd row: 4
3rd row: 0
4th row: 4
5th row: 4

Common Values

Value   Count   Frequency (%)
4       4390    51.2%
0       2091    24.4%
1       1368    15.9%
2       726     8.5%
3       3       < 0.1%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

(variable name missing from the source)
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 4
4th row: 2
5th row: 2

Common Values

Value   Count   Frequency (%)
4       5754    67.1%
1       2337    27.2%
2       395     4.6%
3       79      0.9%
0       13      0.2%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

NAME_FAMILY_STATUS
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
1       5832    68.0%
3       1199    14.0%
0       724     8.4%
2       486     5.7%
4       337     3.9%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

NAME_HOUSING_TYPE
Real number (ℝ≥0)

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.298671019
Minimum: 0
Maximum: 5
Zeros: 41
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 67.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.972443261
Coefficient of variation (CV): 0.7487987696
Kurtosis: 8.771831058
Mean: 1.298671019
Median Absolute Deviation (MAD): 0
Skewness: 3.180293341
Sum: 11140
Variance: 0.945645896
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value   Count   Frequency (%)
1       7599    88.6%
5       435     5.1%
2       289     3.4%
4       146     1.7%
3       68      0.8%
0       41      0.5%

Minimum values

Value   Count   Frequency (%)
0       41      0.5%
1       7599    88.6%
2       289     3.4%
3       68      0.8%
4       146     1.7%
5       435     5.1%

Maximum values

Value   Count   Frequency (%)
5       435     5.1%
4       146     1.7%
3       68      0.8%
2       289     3.4%
1       7599    88.6%
0       41      0.5%
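
Because NAME_HOUSING_TYPE has only six distinct values, its Sum and Mean can be recomputed directly from the frequency table. A short sketch with the counts copied from the report; note that the values look like integer codes, so the mean is a mean of codes rather than of a measured quantity:

```python
# Value -> count, copied from the common values table.
counts = {0: 41, 1: 7599, 2: 289, 3: 68, 4: 146, 5: 435}

n = sum(counts.values())                                    # number of observations
total = sum(value * count for value, count in counts.items())  # Sum
mean = total / n                                            # Mean

print(n, total, round(mean, 9))
```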

FLAG_WORK_PHONE
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       6664    77.7%
1       1914    22.3%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

FLAG_PHONE
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       6077    70.8%
1       2501    29.2%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

FLAG_EMAIL
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       7732    90.1%
1       846     9.9%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

OCCUPATION_TYPE
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 19
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.131732339
Minimum: 0
Maximum: 18
Zeros: 294
Zeros (%): 3.4%
Negative: 0
Negative (%): 0.0%
Memory size: 67.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 6
median: 10
Q3: 12
95-th percentile: 15
Maximum: 18
Range: 18
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.316702589
Coefficient of variation (CV): 0.4727145331
Kurtosis: -0.726897615
Mean: 9.131732339
Median Absolute Deviation (MAD): 2
Skewness: -0.4047553466
Sum: 78332
Variance: 18.63392124
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=19)

Common values

Value               Count   Frequency (%)
12                  2546    29.7%
8                   1452    16.9%
3                   863     10.1%
15                  831     9.7%
10                  754     8.8%
4                   498     5.8%
6                   339     4.0%
11                  303     3.5%
0                   294     3.4%
17                  162     1.9%
Other values (9)    536     6.2%

Minimum values

Value   Count   Frequency (%)
0       294     3.4%
1       131     1.5%
2       160     1.9%
3       863     10.1%
4       498     5.8%
5       23      0.3%
6       339     4.0%
7       19      0.2%
8       1452    16.9%
9       49      0.6%

Maximum values

Value   Count   Frequency (%)
18      41      0.5%
17      162     1.9%
16      31      0.4%
15      831     9.7%
14      17      0.2%
13      65      0.8%
12      2546    29.7%
11      303     3.5%
10      754     8.8%
9       49      0.6%

CNT_FAM_MEMBERS
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.193984612
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 67.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 2
Q3: 3
95-th percentile: 4
Maximum: 9
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9093973655
Coefficient of variation (CV): 0.4144957812
Kurtosis: 1.518107344
Mean: 2.193984612
Median Absolute Deviation (MAD): 0
Skewness: 0.9470341408
Sum: 18820
Variance: 0.8270035684
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Common values

Value   Count   Frequency (%)
2       4566    53.2%
1       1681    19.6%
3       1458    17.0%
4       757     8.8%
5       100     1.2%
6       11      0.1%
7       3       < 0.1%
9       2       < 0.1%

Minimum values

Value   Count   Frequency (%)
1       1681    19.6%
2       4566    53.2%
3       1458    17.0%
4       757     8.8%
5       100     1.2%
6       11      0.1%
7       3       < 0.1%
9       2       < 0.1%

Maximum values

Value   Count   Frequency (%)
9       2       < 0.1%
7       3       < 0.1%
6       11      0.1%
5       100     1.2%
4       757     8.8%
3       1458    17.0%
2       4566    53.2%
1       1681    19.6%

AGE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 4128
Distinct (%): 48.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.23082104
Minimum: 21.09557349
Maximum: 68.86383704
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 67.1 KiB

Quantile statistics

Minimum: 21.09557349
5-th percentile: 26.96564611
Q1: 33.43463589
median: 41.8872393
Q3: 52.56781453
95-th percentile: 62.87329651
Maximum: 68.86383704
Range: 47.76826355
Interquartile range (IQR): 19.13317864

Descriptive statistics

Standard deviation: 11.55732585
Coefficient of variation (CV): 0.2673399573
Kurtosis: -1.041837696
Mean: 43.23082104
Median Absolute Deviation (MAD): 9.384176266
Skewness: 0.2444028127
Sum: 370833.9829
Variance: 133.5717808
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
34.70570922            16      0.2%
42.48957884            15      0.2%
46.25967679            13      0.2%
60.14360322            13      0.2%
30.96299034            12      0.1%
30.68098592            12      0.1%
37.4709953             12      0.1%
36.81389762            11      0.1%
40.21164021            11      0.1%
62.27917069            11      0.1%
Other values (4118)    8452    98.5%

Minimum values

Value         Count   Frequency (%)
21.09557349   1       < 0.1%
21.23794465   1       < 0.1%
21.79100187   1       < 0.1%
22.05110303   1       < 0.1%
22.05657885   2       < 0.1%
22.27013559   1       < 0.1%
22.30025257   1       < 0.1%
22.3112042    1       < 0.1%
22.33310746   1       < 0.1%
22.36322443   1       < 0.1%

Maximum values

Value         Count   Frequency (%)
68.86383704   1       < 0.1%
68.83098216   1       < 0.1%
68.71872797   1       < 0.1%
68.01782377   1       < 0.1%
67.95485191   1       < 0.1%
67.9493761    1       < 0.1%
67.91378331   1       < 0.1%
67.85628726   1       < 0.1%
67.84533563   1       < 0.1%
67.78236377   1       < 0.1%

YEARS_EMPLOYED
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2420
Distinct (%): 28.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.956858019
Minimum: 0
Maximum: 42.87836164
Zeros: 1351
Zeros (%): 15.7%
Negative: 0
Negative (%): 0.0%
Memory size: 67.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1.137600361
median: 4.178046093
Q3: 8.512152885
95-th percentile: 19.29676859
Maximum: 42.87836164
Range: 42.87836164
Interquartile range (IQR): 7.374552523

Descriptive statistics

Standard deviation: 6.437979503
Coefficient of variation (CV): 1.08076766
Kurtosis: 4.18633369
Mean: 5.956858019
Median Absolute Deviation (MAD): 3.455238643
Skewness: 1.819572645
Sum: 51097.92809
Variance: 41.44758008
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
0                      1351    15.7%
1.09790071             24      0.3%
0.5475814014           20      0.2%
4.213638884            19      0.2%
1.927486533            18      0.2%
5.212974941            17      0.2%
3.438811201            17      0.2%
7.515554734            17      0.2%
4.632538656            16      0.2%
8.854391261            15      0.2%
Other values (2410)    7064    82.4%

Minimum values

Value           Count   Frequency (%)
0               1351    15.7%
0.04654441912   2       < 0.1%
0.1779639555    1       < 0.1%
0.1916534905    1       < 0.1%
0.1943913975    1       < 0.1%
0.1998672115    2       < 0.1%
0.2436737236    2       < 0.1%
0.2491495376    2       < 0.1%
0.2518874446    1       < 0.1%
0.2546253516    5       0.1%

Maximum values

Value         Count   Frequency (%)
42.87836164   1       < 0.1%
41.17264557   3       < 0.1%
40.75922161   1       < 0.1%
40.54840277   1       < 0.1%
40.45257603   2       < 0.1%
39.79821625   1       < 0.1%
39.62572811   3       < 0.1%
38.37998042   2       < 0.1%
36.7290225    1       < 0.1%
36.63867157   1       < 0.1%

STATUS
Categorical

UNIFORM

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 67.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
1       4289    50.0%
0       4289    50.0%

Plots: histogram of lengths of the category; pie chart of the common values.

Most occurring characters, categories, scripts, and blocks: No values found.

MONTHS_BALANCE
Real number (ℝ≥0)

Distinct: 61
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.50373047
Minimum: 0
Maximum: 60
Zeros: 46
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 67.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4
Q1: 14
median: 26
Q3: 40
95-th percentile: 55
Maximum: 60
Range: 60
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 16.15204219
Coefficient of variation (CV): 0.5872673238
Kurtosis: -1.043031607
Mean: 27.50373047
Median Absolute Deviation (MAD): 13
Skewness: 0.2170895007
Sum: 235927
Variance: 260.8884669
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
10                   209     2.4%
11                   196     2.3%
17                   193     2.2%
16                   190     2.2%
14                   190     2.2%
39                   189     2.2%
23                   187     2.2%
22                   183     2.1%
25                   182     2.1%
7                    179     2.1%
Other values (51)    6680    77.9%

Minimum values

Value   Count   Frequency (%)
0       46      0.5%
1       90      1.0%
2       107     1.2%
3       131     1.5%
4       134     1.6%
5       166     1.9%
6       167     1.9%
7       179     2.1%
8       155     1.8%
9       176     2.1%

Maximum values

Value   Count   Frequency (%)
60      86      1.0%
59      74      0.9%
58      67      0.8%
57      74      0.9%
56      93      1.1%
55      98      1.1%
54      85      1.0%
53      106     1.2%
52      115     1.3%
51      112     1.3%

Interactions

[81 pairwise interaction plots; images not included in this text version]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
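
As a minimal sketch of the calculation described above (not the code used to generate this report), the following standard-library Python ranks both variables and then divides the covariance of the ranks by the product of their standard deviations. The function names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the mean of their positions is an assumption about tie handling.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = mean((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / (pstdev(rx) * pstdev(ry))


# A nonlinear but strictly increasing relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
```

Because only the ranks enter the calculation, the cubic relationship in the example is treated exactly like a straight line.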

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
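
The definition above can be sketched in a few lines of standard-library Python. This is an illustration only, not the code behind this report. The name `pearson_r` and the sample values are hypothetical, and population (rather than sample) moments are used, which does not change the ratio.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


# Made-up, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(x, y)

# Shifting and rescaling each variable separately (with positive factors)
# leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
```

Multiplying one variable by a negative factor would flip the sign of r while keeping its magnitude.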

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
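
The pair-counting description above corresponds to the simplest variant, often called tau-a. The sketch below implements exactly that in standard-library Python. The name `kendall_tau_a` is hypothetical, and statistical libraries commonly apply a tie correction (tau-b) instead, so results on data with ties may differ from the values in this report.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 is a tie and counts as neither.
    return (concordant - discordant) / len(pairs)


# Four observations give six pairs: five concordant, one discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)  # (5 - 1) / 6
```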

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
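
For illustration, here is a minimal standard-library sketch of a bias-corrected Cramér's V along the lines of Bergsma's 2013 proposal. It is an assumption about the general form of the correction, not a copy of the code that produced this report. The name `cramers_v_corrected` is hypothetical, and no continuity correction is applied to the chi-squared statistic.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_tot, col_tot = Counter(a), Counter(b)
    r, k = len(row_tot), len(col_tot)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for ra, na in row_tot.items():
        for cb, nb in col_tot.items():
            expected = na * nb / n
            chi2 += (joint.get((ra, cb), 0) - expected) ** 2 / expected

    # Shrink phi^2 and the table dimensions to correct the upward bias.
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfectly associated labels versus independent labels (made-up data).
v_perfect = cramers_v_corrected(["x", "y"] * 10, ["p", "q"] * 10)
v_indep = cramers_v_corrected(["x", "x", "y", "y"] * 5, ["p", "q", "p", "q"] * 5)
```

With the correction, perfectly associated variables still score 1 and independent ones score 0, while small spurious associations in finite samples are pulled toward 0.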

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

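(index) | Unnamed: 0 | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | AGE | YEARS_EMPLOYED | STATUS | MONTHS_BALANCE
0 | 0 | 5008804 | 1 | 1 | 1 | 427500.0 | 4 | 1 | 0 | 4 | 1 | 0 | 0 | 12 | 2 | 32.868574 | 12.435574 | 1 | 15
1 | 1 | 5008805 | 1 | 1 | 1 | 427500.0 | 4 | 1 | 0 | 4 | 1 | 0 | 0 | 12 | 2 | 32.868574 | 12.435574 | 1 | 14
2 | 16 | 5008823 | 1 | 1 | 1 | 135000.0 | 0 | 4 | 1 | 1 | 0 | 0 | 0 | 8 | 2 | 48.674511 | 3.269061 | 0 | 7
3 | 18 | 5008825 | 0 | 1 | 0 | 130500.0 | 4 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 29.210730 | 3.019911 | 1 | 25
4 | 19 | 5008826 | 0 | 1 | 0 | 130500.0 | 4 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 29.210730 | 3.019911 | 1 | 30
5 | 20 | 5008830 | 0 | 0 | 1 | 157500.0 | 4 | 4 | 1 | 1 | 0 | 1 | 0 | 8 | 2 | 27.463945 | 4.021985 | 1 | 31
6 | 21 | 5008831 | 0 | 0 | 1 | 157500.0 | 4 | 4 | 1 | 1 | 0 | 1 | 0 | 8 | 2 | 27.463945 | 4.021985 | 1 | 19
7 | 22 | 5008832 | 0 | 0 | 1 | 157500.0 | 4 | 4 | 1 | 1 | 0 | 1 | 0 | 8 | 2 | 27.463945 | 4.021985 | 1 | 34
8 | 28 | 5008839 | 1 | 0 | 1 | 405000.0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 10 | 3 | 32.422295 | 5.519621 | 0 | 13
9 | 32 | 5008843 | 1 | 0 | 1 | 405000.0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 10 | 3 | 32.422295 | 5.519621 | 0 | 29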

Last rows

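(index) | Unnamed: 0 | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | AGE | YEARS_EMPLOYED | STATUS | MONTHS_BALANCE
8568 | 36443 | 5149145 | 1 | 1 | 1 | 247500.0 | 4 | 4 | 1 | 1 | 1 | 0 | 0 | 8 | 2 | 29.985558 | 9.793493 | 1 | 25
8569 | 36444 | 5149158 | 1 | 1 | 1 | 247500.0 | 4 | 4 | 1 | 1 | 1 | 0 | 0 | 8 | 2 | 29.985558 | 9.793493 | 1 | 28
8570 | 36445 | 5149190 | 1 | 1 | 0 | 450000.0 | 4 | 1 | 1 | 1 | 0 | 1 | 1 | 3 | 3 | 26.960170 | 1.374429 | 1 | 11
8571 | 36446 | 5149729 | 1 | 1 | 1 | 90000.0 | 4 | 4 | 1 | 1 | 0 | 0 | 0 | 12 | 2 | 52.296762 | 4.711938 | 1 | 21
8572 | 36447 | 5149775 | 0 | 1 | 1 | 130500.0 | 4 | 4 | 1 | 1 | 0 | 1 | 0 | 8 | 2 | 44.181605 | 25.711685 | 1 | 19
8573 | 36448 | 5149828 | 1 | 1 | 1 | 315000.0 | 4 | 4 | 1 | 1 | 0 | 0 | 0 | 10 | 2 | 47.497211 | 6.625735 | 1 | 11
8574 | 36449 | 5149834 | 0 | 0 | 1 | 157500.0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 11 | 2 | 33.914454 | 3.627727 | 1 | 23
8575 | 36450 | 5149838 | 0 | 0 | 1 | 157500.0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 11 | 2 | 33.914454 | 3.627727 | 1 | 32
8576 | 36451 | 5150049 | 0 | 0 | 1 | 283500.0 | 4 | 4 | 1 | 1 | 0 | 0 | 0 | 15 | 2 | 49.167334 | 1.793329 | 1 | 9
8577 | 36452 | 5150337 | 1 | 0 | 1 | 112500.0 | 4 | 4 | 3 | 4 | 0 | 0 | 0 | 8 | 1 | 25.155890 | 3.266323 | 1 | 13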